{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Assignment 12"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 1.复习上课内容"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 2.回答以下理论问题"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### 1. 请写一下TF-IDF的计算公式"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### 2. LDA算法的基本假设是什么？"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### 3. 在TextRank算法中构建图的权重是如何得到的？"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### 4. 什么是命名实体识别？ 有什么应用场景？"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### 5.NLP主要有哪几类任务 ？"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 3.实践题"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### 3.1 手动实现TextRank算法 (在新闻数据中随机提取100条新闻训练词向量和做做法测试）"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    " 提示：\n",
    " 1. 确定窗口，建立图链接。   \n",
    " 2. 通过词向量相似度确定图上边的权重\n",
    " 3. 根据公式实现算法迭代(d=0.85)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### 选做 1.  提取新闻人物里的对话。(使用以上提取小数据即可）"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "提示：    \n",
    "1.寻找预料里具有表示说的意思。    \n",
    "2.使用语法分析提取句子结构。    \n",
    "3.检测谓语是否有表示说的意思。"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### 选择2. ： 电影评论分类。"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "在这个作业中你要完成一个电影评论分类任务。"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "1.数据获取。（采用爬虫技术爬取相关网页上的电影评论数据，例如猫眼电影评论，豆瓣电影评论）"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "2.把所获得数据分解为训练集，验证集和测试集。"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "3.选用相应算法构建模型，并测试。"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### 选择3：文章自动续写"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "在这个作业中你要完成一个文章自动续写的模型。"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "1.数据获取。（根据你的兴趣采用爬虫技术爬去相关网站上的文本数据内容：比如故事网站，小说网站等）"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "2.选取模型，并训练。"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "3.展示一些你模型的输出例子。"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.7.4"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
